What is SKYNET?
DEMONSPIT Data Flow
Automated Bulk Cloud Analytics
Analytic Triage
• Collaborative cloud research effort between 5 different
organizations crossing 3 NSA Directorates:

- Signals Intelligence: S2I, S22, SSG

- Research: R6

- Technology: T12, T14

• Partnerships

- TMAC/FASTSCOPE

- MIT Lincoln Labs & Harvard

• SKYNET applies complex combinations of geospatial,
geotemporal, pattern-of-life, and travel analytics to
bulk DNR data to identify patterns of suspect activity
Rough outline of courier path as described by the targets

[Map: numbered stops 1-4 along a route; legible labels include Rawalpindi, "Probably Faisalabad", Faisalabad, Lahore, "Sunday", and "Sunday/Monday"]

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL
: El?/GQEVriNT//ORCON/REL TO ‘USA, *AU! 



Who has traveled from Peshawar to Faisalabad or
Lahore (and back) in the past month?

Who does the traveler call when he arrives?

• Who else is seen in the area when the traveler arrives, and
who is seen leaving the area shortly afterward?

Who travels to/from Peshawar every other Sunday and
"somewhere else" on a weekly basis?

Who visits Akora Khattak periodically and also travels
between Peshawar and Lahore?

Who fits the above travel profiles and also possesses
unusual behavior:

• One or two hops from other suspects or known tasked
selectors

• Frequent handset swapping or powering down
DEMONSPIT is a new dataflow for bulk Call Data Records (CDRs) from
Pakistan

- CDRs are being acquired from major PK Telecom providers

Data is normalized through TUSKATTIRE, like all other Call Data Records

DEMONSPIT data is forwarded by TUSKATTIRE to several Clouds:

- GMHalo/DPS

• Promotes records to FASCIA and feeds the SEDB Tower QFD

- GMPlace & Cloud 14

• Ingests DEMONSPIT into Sortinglead summaries to support SKYNET
Analytics

• Ingests DEMONSPIT into a Perishable QFD which will be available to
analysts via JEMA and CINEPLEX

- Bulldozer/MDR2
[Diagram labels: Access to DEMONSPIT FASCIA Promoted Data; SKYNET & Analyst Promoted CDRs; Access to CDRs, Analyst Queries, & Results of SKYNET Analytics; Access to ALL DEMONSPIT Data]
What is SKYNET?

DEMONSPIT Data Flow 
Automated Bulk Cloud Analytics 
Analytic Triage 
Travel Patterns

- Travel phrases (Locations visited in given timeframe) 

- Regular/repeated visits to locations of interest 
Behavior-Based Analytics 

- Low use, incoming calls only 

- Excessive SIM or Handset swapping 

- Frequent Detach/Power-down 

- Courier machine learning models 

Other Enrichments 

• Travel on particular days of the week 

• Co-travelers 

• Similar travel patterns 

• Common contacts 

• Visits to airports 

• Other countries 

• Overnight trips 

• Permanent move 
Traveling Between Peshawar and Lahore?

Case-Specific Behavioral Cloud Analytics: Peshawar-Lahore Travel, 1-4 NOV 2011

Column headings: TASKED NUM, SELECTOR, ASSOCIATED ACTIVITY, TRAVEL PHRASE, DOW, MSISDN, IMSI, CONTACTS, SWAPPING, SELECTORS, CATEGORIES

Rows (travel phrase, day of week):

- torkham AF PK peshawar lahore, FRI

2

- PK peshawar lahore, THU

- behsud AF jalalabad jalal_abad Jalalabad behsud rodat bati_kot mohmand_darah peshawar PK, WED

- gtrd PK nowshera gutbahar peshawar sanda_kalan lahore, THU

- jamrud PK peshawar lahore, TUE

- PK peshawar lahore, THU; categories: 5-or-fewer-contacts, sms-and-zero-duration-calls-only, low-use
What is SKYNET?

DEMONSPIT Data Flow
Automated Bulk Cloud Analytics

Analytic triage
- SMARTTRACKER
- RT-RG
- JEMA
Examine travel patterns for common routes and
meeting locations

- Run cell soaks on all common meeting locations
during meeting timeframe

Analyze selectors for common contacts

Analyze selectors for handset sharing behavior

Repeat procedure with resulting selectors

Correlate with other known and suspected selectors
Coincidence Count

[Figure: sets with 2 targets; values shown: 1 at 1 location; 101 at 16 locations; 39 at 24 locations; 3' at 12 locations]
Meetings - who is at the same ucellid at the
same time as the potential courier at the
destination city? ... Multiple times.

Sidekicks - is there a pair traveling together to the
destination city?
[Flowchart labels:]

- Start/end points

- Destination Cities

- Human in the loop to analyze travel reports.

- Evaluate, add value, prioritize

- Are selectors seen meeting at destination consistently?

- Does Sidekick selector have call events?
[Diagram labels: Movement Irregularity; Travel Reports; Meetings; Sidekicks]



SKYNET WIKI

https:/.

UNCLASSIFIED//FOUO

@nsa.ic.gov

@nsa.ic.gov







TOP SECRET//COMINT//REL TO USA, FVEY



SKYNET: 

Courier Detection via Machine Learning 
June 5, 2012
Given a handful of courier selectors, can we find others
that “behave similarly” by analyzing GSM metadata? 




It’s worth noting that: 

• we are looking for 
different people using 
phones in similar ways 

• without using any call 
chaining techniques 
from known selectors 

• by scanning through 
all selectors seen in 
Pakistan that have not 
left Af/Pak (~55M) 
From GSM metadata, we can measure aspects of each
selector’s

[Map label: Miram Shah, North Waziristan]
This presentation describes our search for
AQSL couriers using behavioral profiling 




Behavioral Feature Extraction 



Cross Validation Experiment 
on AQSL Couriers 





Preliminary SIGINT Findings 




Counting unique UCELLIDs shows that couriers 
travel more often than typical Pakistani selectors 




[Figure legend, Group: AQSL Local Comms; AQSL Remote Comms; Seen in Pakistan]

By examining multiple features at once, we can see some 
indicative behaviors of our courier selectors 
Looking at a hierarchical clustering derived from all
80 features, the AQSL groups mostly stay together

[Figure label: AQSL Remote Comms]
Now, we’ll describe a cross validation experiment
on the AQSL selectors that we were provided 



Behavioral Feature Extraction 



Cross Validation Experiment 
on AQSL Couriers 



Preliminary SIGINT Findings 





Our initial detector uses the centroid of the AQSL 
couriers to “find other selectors like these” 



AQSL Cross-Validation 

Experiment 

• 7 MSISDN/IMSI pairs 

• Hold each pair out 
and score them when 
training the centroid 
on the rest 

• Assume that random 
draws of Pakistani 
selectors are 
nontargets 

• How well do we do? 
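
The hold-one-out procedure above can be sketched as follows. This is only an illustration on synthetic, made-up feature vectors: the three-feature layout, the negative-Euclidean-distance score, and the names `centroid`, `score`, and `leave_one_out_scores` are assumptions, not details taken from the slides.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def score(vec, center):
    """Similarity score: negative Euclidean distance (higher = closer)."""
    return -math.dist(vec, center)

def leave_one_out_scores(targets):
    """Score each target against a centroid trained on the remaining targets."""
    scores = []
    for i, held_out in enumerate(targets):
        rest = targets[:i] + targets[i + 1:]
        scores.append(score(held_out, centroid(rest)))
    return scores

# Synthetic stand-in data (assumption): 7 "target" vectors and
# 1000 random "nontarget" vectors with 3 features each.
rng = random.Random(0)
targets = [[rng.gauss(2.0, 0.5) for _ in range(3)] for _ in range(7)]
nontargets = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]

target_scores = leave_one_out_scores(targets)
# Nontargets are never part of training, so they are scored
# against the centroid of all targets.
full_center = centroid(targets)
nontarget_scores = [score(v, full_center) for v in nontargets]
```

Comparing `target_scores` with `nontarget_scores` at different thresholds is what produces the miss and false alarm rates discussed on the following slides.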
Our initial detector uses the centroid of the AQSL
couriers to “find other selectors like these” 



AQSL Cross-Validation 

Experiment 

• Initial experiments 
showed EER in 
10-20% range 

• Here, performance is
much worse against
these nontargets:

• Seen in Pakistan 

• Not seen outside of 
Af/Pak 

• Not FVEY selectors 
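
For reference, the equal error rate (EER) is the operating point where the miss rate equals the false alarm rate. A minimal sketch of how it can be approximated from two score lists is shown here; the function name and the toy scores are hypothetical, and the threshold sweep over observed scores is one simple approach among several.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    A selector is flagged when its score is >= the threshold. Returns the
    average of miss rate and false alarm rate at the threshold where the
    two are closest.
    """
    best_gap, best_rate = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + fa) / 2
    return best_rate

# Toy scores (made up for illustration only)
eer = equal_error_rate([0.9, 0.8, 0.7, 0.2], [0.6, 0.3, 0.1, 0.05])
```

In this toy case one of four targets scores below a nontarget, so the two error rates meet at 25%.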




[Figure axis: false alarm probability (%)]

Statistical algorithms are able to find the couriers at very 
low false alarm rates, if we’re allowed to miss half of them 



Random Forest 
Classifier 

• 7 MSISDN/IMSI pairs 

• Hold each pair out and
then try to find them after
learning how to distinguish
remaining couriers from
other Pakistanis

(using 100k random selectors here)

• Assume that random 
draws of Pakistani 
selectors are nontargets 

• 0.18% False Alarm Rate at 
50% Miss Rate 
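
The "false alarm rate at 50% miss rate" metric can be computed from classifier scores roughly as follows. This is a sketch under stated assumptions: the function name, the rounding-down of the number of missed targets, and the toy scores are illustrative choices, not the presenters' code.

```python
def false_alarm_rate_at_miss_rate(target_scores, nontarget_scores, miss_rate=0.5):
    """Fraction of nontargets flagged when the threshold misses `miss_rate` of targets.

    The threshold is set at the lowest-scoring target that must still be
    detected; anything scoring at or above it is flagged.
    """
    ranked = sorted(target_scores, reverse=True)
    n_missed = int(miss_rate * len(ranked))  # rounds down (assumption)
    n_detected = len(ranked) - n_missed
    if n_detected == 0:
        return 0.0  # nothing has to be detected, so nothing is flagged
    threshold = ranked[n_detected - 1]
    flagged = sum(s >= threshold for s in nontarget_scores)
    return flagged / len(nontarget_scores)

# Toy scores (made up): 2 of 4 targets are detected at threshold 0.8,
# and 1 of 5 nontargets scores at or above that threshold.
far = false_alarm_rate_at_miss_rate([0.9, 0.8, 0.4, 0.2],
                                    [0.85, 0.5, 0.3, 0.1, 0.05])
```

With only 7 target pairs, a 50% miss rate under this rounding means 3 targets are missed and 4 are detected, so the estimate rests on very few positive examples.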
[Figure legend: Centroid (All Raw Features); Centroid (All Normalized Features); Centroid (Outgoing Raw Features); Centroid (Outgoing Normalized Features); Random Forest (All Raw Features); Random Forest (Outgoing Raw Features). x-axis: false alarm probability (%), ticks at 1e-04, 0.01, 0.1]
We’ve been experimenting with several error
metrics on both small and large test sets

Column groups: 100k Test Selectors; 55M Test Selectors

Training Data        Classifier      Features   False Alarm Rate    Mean Reciprocal   Tasked Selectors   Tasked Selectors
                                                at 50% Miss Rate    Rank              in Top 500         in Top 100
None                 Random          None       50%                 1/23k             0.64               0.13
                                                                    (simulated)       (active/Pak)       (active/Pak)
Known Couriers       Centroid        All        20%                 1/18k
                                     Outgoing   43%                 1/27k
                     Random Forest              0.18%               1/9.9             5                  1
+ Anchory Selectors

Random Forest:

• 0.18% false alarm rate at 50% miss rate

• 7x improvement over random performance when
evaluating its tasked precision at 100
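
The improvement figure can be checked against the table values with a short calculation. A minimal sketch, assuming the improvement is simply the ratio of tasked selectors found in the top 100 to the number expected from random ranking (the function name is hypothetical):

```python
def improvement_over_random(observed_hits, random_expected_hits):
    """Ratio of observed tasked selectors in the top-k to the random baseline."""
    return observed_hits / random_expected_hits

# Table values: 1 tasked selector in the top 100 for the Random Forest,
# versus 0.13 expected from random ranking.
rf_top100 = improvement_over_random(1, 0.13)
```

Under that assumption the ratio comes to about 7.7, in line with the roughly 7x stated on the slide.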
To get more training data we scraped selectors from S2I11
Anchory reports containing keyword “courier” 



Anchory Selectors 

• Searched for reports 
containing “S2I11” 

AND “courier” 

• Filtered out non-mobile 
numbers and kept 
selectors with 
“interesting” travel 
patterns seen in 
SmartTracker 

Adding selectors from Anchory reports to the training data 
reduced the false alarm rates even further 



[Figure legend: Centroid (All Raw Features); Centroid (All Normalized Features); Centroid (Outgoing Raw Features); Centroid (Outgoing Normalized Features); Random Forest (All Raw Features); Random Forest (Outgoing Raw Features). x-axis: false alarm probability (%)]
We’ve been experimenting with several error
metrics on both small and large test sets

Column groups: 100k Test Selectors; 55M Test Selectors

Training Data        Classifier      Features   False Alarm Rate    Mean Reciprocal   Tasked Selectors   Tasked Selectors
                                                at 50% Miss Rate    Rank              in Top 500         in Top 100
None                 Random          None       50%                 1/23k             0.64               0.13
                                                                    (simulated)       (active/Pak)       (active/Pak)
Known Couriers       Centroid        All        20%                 1/18k
                                     Outgoing   43%                 1/27k
                     Random Forest              0.18%               1/9.9             5                  1
+ Anchory Selectors                             0.008%              1/14              21                 6

Random Forest trained on Known Couriers + Anchory Selectors:

• 0.008% false alarm rate at 50% miss rate

• 46x improvement over random performance when
evaluating its tasked precision at 100
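
To put the false alarm rates in absolute terms, the sketch below converts a percentage rate into an expected count of flagged nontargets. It assumes, purely for illustration, that a rate measured on the 100k test set would carry over unchanged to the full ~55M selector population; the slides do not state this, and the function name is hypothetical.

```python
def expected_false_alarms(population, false_alarm_rate_percent):
    """Expected number of nontargets flagged at a given false alarm rate (in %)."""
    return round(population * false_alarm_rate_percent / 100)

POPULATION = 55_000_000  # ~55M selectors, as stated in the slides

with_anchory = expected_false_alarms(POPULATION, 0.008)     # 0.008% rate
known_couriers_only = expected_false_alarms(POPULATION, 0.18)  # 0.18% rate
```

Under that assumption, 0.008% of 55M is about 4,400 selectors and 0.18% is about 99,000, so even very small percentage rates correspond to thousands of flagged nontargets at this scale.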
Now, we’ll investigate some findings after running these 
classifiers on +55M Pakistani selectors via MapReduce 




Preliminary SIGINT Findings 
The highest scoring selector that traveled to
Peshawar and Lahore is PROB AHMED ZAIDAN 
In the top 500 scoring selectors, 21 are tasked,
leading us to believe that we’re on the right track
We have also discovered many untasked
selectors with interesting travel patterns 
Preliminary results indicate that we’re on the
right track, but much remains to be done



Cross Validation Experiment: 

- Random Forest classifier operating at 
0.18% false alarm rate at 50% miss 

- Enhancing training data with Anchory 
selectors reduced that to 0.008% 

- Mean Reciprocal Rank is ~1/10
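
For readers unfamiliar with the metric, mean reciprocal rank averages 1/rank over the held-out targets, where rank 1 is the top of the scored list. A minimal sketch with made-up ranks (the function name and the example ranks are hypothetical, not values from the experiment):

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all held-out targets (rank 1 = best)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Made-up example: three held-out targets ranked 1st, 2nd and 4th.
mrr = mean_reciprocal_rank([1, 2, 4])
```

Because the reciprocal heavily rewards ranks near the top, a mean reciprocal rank near 1/10 can arise from a few targets ranked very high alongside others ranked far lower; it does not imply that every target sits around rank 10.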
[Figure legend: Centroid (All Raw Features); Centroid (All Normalized Features); Centroid (Outgoing Raw Features); Centroid (Outgoing Normalized Features); Random Forest (All Raw Features); Random Forest (Outgoing Raw Features). x-axis: false alarm probability (%)]



Preliminary SIGINT Findings: 

- Behavioral features helped discover 
similar selectors with “courier-like” 
travel patterns 

- High number of tasked selectors at 
the top is hopefully indicative of the 
detector performing well “in the wild” 